ABSTRACT

The frequent episode discovery framework is a popular approach
in temporal data mining with many applications. Over the years,
many different notions of episode frequency have been proposed,
along with different algorithms for episode discovery. In this
paper we present a unified view of all such frequency counting
algorithms. We present a generic algorithm such that all current
algorithms are special cases of it. This unified view yields
insights into the different frequencies, and we present
quantitative relationships among them. It also helps in obtaining
correctness proofs for various algorithms, as we show here.
Finally, we point out how this unified view helps in generalizing
the algorithms so that they can discover episodes with general
partial orders.

1. INTRODUCTION 

Temporal data mining is concerned with finding useful patterns
in sequential (symbolic) data streams [16]. Frequent episode
discovery, first introduced in [14], is a popular framework for
mining patterns from sequential data. The framework has been
successfully used in many application domains, e.g., analysis of
alarm sequences in telecommunication networks [14], root cause
diagnostics from fault log data in manufacturing [22],
user-behavior prediction from web interaction logs [11], inferring
functional connectivity from multi-neuronal spike train data [19],
relating financial events and stock trends [17], protein sequence
classification [2], intrusion detection [12, 23], text mining [7],
seismic data analysis [15], etc. The data in this framework is a
single long stream of events, where each event is described by
a symbolic event-type from a finite alphabet and the time of 
occurrence of the event. The patterns of interest are termed 
episodes. Informally, an episode is a short ordered sequence 
of event types, and a frequent episode is one that occurs of- 
ten enough in the given data sequence. Discovering frequent 
episodes is a good way to unearth temporal correlations in 
the data. Given a user-defined frequency threshold, the task 
is to efficiently obtain all frequent episodes in the data se- 
quence. 

An important design choice in frequent episode discovery
is the definition of frequency of episodes. Intuitively, any
frequency should capture the notion of the episode occurring
many times in the data and, at the same time, should admit an
efficient algorithm for computing it. There are many ways to
define frequency and this has given rise to different algorithms
for frequent episode discovery [3, 6-8, 13-15]. In the original
framework of [14], frequency was defined as the number of
fixed-width sliding windows over the data that contain at least
one occurrence of the episode. Another notion of frequency is
based on the number of minimal occurrences [13, 14]. Two frequency
definitions called head frequency and total frequency are proposed
in [7] in order to overcome some limitations of the windows-based
frequency of [14]. In [8], two more frequency definitions for
episodes were proposed, based on certain specialized sets of
occurrences of episodes in the data.

Many of the algorithms, such as the WINEPI of [14] and the 
occurrences-based frequency counting algorithms of [9, 10],
employ finite state automata as the basic building blocks for 
recognizing occurrences of episodes in the data sequence. An 
automata-based counting scheme for minimal occurrences 
has also been proposed in [4]. 

The multiplicity of frequency definitions and the associated 
algorithms for frequent episode discovery makes it difficult 
to compare the different methods. In this paper, we present 
a unified view of algorithms for frequent episode discovery 
under all the various frequency definitions. We present a
generic automata-based algorithm for obtaining frequencies 
of a set of episodes and show that all the currently available 
algorithms can be obtained as special cases of this method. 
This viewpoint helps in obtaining useful insights regarding 
the kinds of occurrences tracked by the different algorithms. 
The framework also aids in deriving proofs of correctness 
for the various counting algorithms, many of which are not 
currently available in literature. Our framework also helps 
in understanding the anti-monotonicity conditions satisfied 
by different frequencies which is needed for the candidate 
generation step. Our general view can also help in general- 
izing current algorithms, which can discover only serial or 
parallel episodes, to the case of episodes with general partial 
orders and we briefly comment on this in our conclusions. 
The paper is organized as follows. Sec. [2] gives an overview 
of the episode framework and explains all the currently used 
frequencies in literature. Sec. [3] presents our generic algo- 
rithm and shows that all current counting techniques for 
these various frequencies can be derived as special cases. 
Sec. [4] gives proofs of correctness for the various counting 
algorithms utilizing this unified framework. Sec. [5] discusses 
the candidate generation step for all these frequencies. In 
Sec. [6] we provide some discussion and concluding remarks. 



2. AN OVERVIEW OF FREQUENT EPISODE 
DISCOVERY 

In this section we briefly review the framework of frequent 
episode discovery [14]. The data, referred to as an event
sequence, is denoted by
$\mathbb{D} = \langle (E_1, t_1), (E_2, t_2), \ldots, (E_n, t_n) \rangle$,
where each pair $(E_i, t_i)$ represents an event, and the number
of events in the event sequence is $n$. Each $E_i$ is a symbol
(or event-type) from a finite alphabet, $\mathcal{E}$, and $t_i$
is a positive integer representing the time of occurrence of the
$i$-th event. The sequence is ordered so that $t_i < t_{i+1}$ for
all $i = 1, 2, \ldots, n-1$.

The following is an example event sequence with 12 events:

$\langle (A,1), (A,2), (B,3), (A,6), (A,7), (C,8), (B,9), (D,11), (C,12), (A,13), (B,14), (C,15) \rangle$    (1)

An $N$-node episode, $\alpha$, is defined as a triple,
$(V_\alpha, <_\alpha, g_\alpha)$, where
$V_\alpha = \{v_1, v_2, \ldots, v_N\}$ is a collection of $N$
nodes, $<_\alpha$ is a partial order on $V_\alpha$ and
$g_\alpha : V_\alpha \rightarrow \mathcal{E}$ is a map that
associates each node in $\alpha$ with an event type from
$\mathcal{E}$. Thus an episode is a (typically small) collection
of event-types along with an associated partial order. When the
order $<_\alpha$ is total, $\alpha$ is called a serial episode,
and when the order is empty, $\alpha$ is called a parallel
episode. In this paper, we restrict our attention to serial
episodes¹. Without loss of generality, we can now assume that the
total order on the nodes of $\alpha$ is given by
$v_1 <_\alpha v_2 <_\alpha \cdots <_\alpha v_N$. For example,
consider a 3-node episode $V_\alpha = \{v_1, v_2, v_3\}$,
$g_\alpha(v_1) = A$, $g_\alpha(v_2) = B$, $g_\alpha(v_3) = C$,
with $v_1 <_\alpha v_2 <_\alpha v_3$. We denote such an episode
by $(A \rightarrow B \rightarrow C)$. An occurrence of episode
$\alpha$ in an event sequence $\mathbb{D}$ is a map
$h : V_\alpha \rightarrow \{1, \ldots, n\}$ such that
$g_\alpha(v) = E_{h(v)}$ for all $v \in V_\alpha$, and for all
$v, w \in V_\alpha$ with $v <_\alpha w$ we have
$t_{h(v)} < t_{h(w)}$. In the example event sequence (1), the
events $(A,2)$, $(B,3)$ and $(C,8)$ constitute an occurrence of
$(A \rightarrow B \rightarrow C)$ while $(B,3)$, $(A,7)$ and
$(C,8)$ do not. We use $\alpha[i]$ to refer to the $i$-th
event-type in $\alpha$. This way, an $N$-node episode $\alpha$
can be represented as
$(\alpha[1] \rightarrow \alpha[2] \rightarrow \cdots \rightarrow \alpha[N])$.
An episode $\beta$ is said to be a subepisode of $\alpha$
(denoted $\beta \preceq \alpha$) if all the event-types in
$\beta$ also appear in $\alpha$, and if their order in $\beta$ is
the same as that in $\alpha$. For example, $(A \rightarrow C)$ is
a 2-node subepisode of the episode
$(A \rightarrow B \rightarrow C)$ while $(B \rightarrow A)$ is not.

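
The two definitions above can be checked mechanically on the example sequence (1). The following is a minimal sketch, not part of the original framework; the helper names `is_occurrence` and `is_subepisode` are hypothetical, and a serial episode is assumed to be given as a tuple of event types.

```python
# Example event sequence (1), as (event_type, time) pairs.
SEQ = [("A", 1), ("A", 2), ("B", 3), ("A", 6), ("A", 7), ("C", 8),
       ("B", 9), ("D", 11), ("C", 12), ("A", 13), ("B", 14), ("C", 15)]

def is_occurrence(episode, events, seq):
    """True if `events` (one event per node, in node order) is an
    occurrence of the serial episode in `seq`."""
    if len(events) != len(episode) or any(e not in seq for e in events):
        return False
    # Each node must map to an event of the right type ...
    types_ok = all(etype == node for (etype, _), node in zip(events, episode))
    # ... and the event times must respect the total order of the nodes.
    times_ok = all(events[i][1] < events[i + 1][1]
                   for i in range(len(events) - 1))
    return types_ok and times_ok

def is_subepisode(beta, alpha):
    """True if the event types of beta appear in alpha in the same order
    (a subsequence test; `in` consumes the shared iterator)."""
    remaining = iter(alpha)
    return all(node in remaining for node in beta)
```

For instance, `is_subepisode(("A", "C"), ("A", "B", "C"))` holds while `is_subepisode(("B", "A"), ("A", "B", "C"))` does not, matching the example in the text.
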
The frequency of an episode is some measure of how often 
it occurs in the event sequence. A frequent episode is one 
whose frequency exceeds a user-defined threshold. The task 
in frequent episode discovery is to find all frequent episodes. 
Given an occurrence $h$ of an $N$-node episode $\alpha$,
$(t_{h(v_N)} - t_{h(v_1)})$ is called the span of the occurrence.
In many applications, one may want to consider only those
occurrences whose span is below some user-chosen limit, because
occurrences constituted by events that are widely separated in
time may not represent any underlying causative influences. We
call any such constraint on span an expiry-time constraint. The
constraint is specified by a threshold, $T_X$, such that
occurrences of episodes whose span is greater than $T_X$ are not
considered while counting the frequency.

One popular approach to frequent episode discovery is to 
use an Apriori-style level-wise procedure. At level k of the 
procedure, a 'candidate generation' step combines frequent 
episodes of size $(k-1)$ to build candidates (or potential
frequent episodes) of size $k$ using some kind of
anti-monotonicity property (e.g., the frequency of an episode
cannot exceed the frequency of any of its subepisodes). The second
step at level $k$ is called 'frequency counting', in which the
algorithm counts or computes the frequencies of the candidates and
determines which of them are frequent.

¹ From now on, we will simply use 'episode' to refer to a
serial episode.

2.1 Frequencies of episodes 

There are many ways to define the frequency of an episode. 
Intuitively, any definition must capture some notion of how 
often the episode occurs in the data. It must also admit 
an efficient algorithm to obtain the frequencies for a set of 
episodes. Further, to be able to apply a level-wise proce- 
dure, we need the frequency definition to satisfy some anti- 
monotonicity criterion. Additionally, we would also like the 
frequency definition to be conducive to statistical signifi- 
cance analysis. 

In this section, we discuss various frequency definitions that 
have been proposed in the literature. (Recall that the data is
an event sequence,
$\mathbb{D} = \langle (E_1, t_1), \ldots, (E_n, t_n) \rangle$.)

Definition 1. [14] A window on an event sequence, $\mathbb{D}$,
is a time interval $[t_s, t_e]$, where $t_s$ and $t_e$ are
positive integers such that $t_s < t_n$ and $t_e > t_1$. The
window width of $[t_s, t_e]$ is given by $(t_e - t_s)$. Given a
user-defined window width $T_X$, the windows-based frequency of
$\alpha$ is the number of windows of width $T_X$ which contain at
least one occurrence of $\alpha$.

For example, in the event sequence (1), there are 5 windows
with window width 5 which contain an occurrence of
$(A \rightarrow B \rightarrow C)$.
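
The count of 5 can be reproduced with a brute-force sketch such as the one below. It is only an illustration: the function names are hypothetical, windows are assumed to be closed intervals with integer end points, and event times are assumed to be strictly increasing so that a greedy left-to-right match is sufficient.

```python
# Example event sequence (1), as (event_type, time) pairs.
SEQ = [("A", 1), ("A", 2), ("B", 3), ("A", 6), ("A", 7), ("C", 8),
       ("B", 9), ("D", 11), ("C", 12), ("A", 13), ("B", 14), ("C", 15)]

def contains_occurrence(episode, events):
    """Greedy match: does the time-ordered event list contain the
    serial episode as a subsequence of event types?"""
    pos = 0
    for etype, _ in events:
        if pos < len(episode) and etype == episode[pos]:
            pos += 1
    return pos == len(episode)

def windows_based_frequency(episode, seq, width):
    """Number of windows [ts, ts + width] holding at least one occurrence."""
    t_first, t_last = seq[0][1], seq[-1][1]
    count = 0
    # Candidate windows satisfy ts < t_n and ts + width > t_1.
    for ts in range(t_first - width + 1, t_last):
        window = [(e, t) for e, t in seq if ts <= t <= ts + width]
        if contains_occurrence(episode, window):
            count += 1
    return count
```

With these assumptions, `windows_based_frequency(("A", "B", "C"), SEQ, 5)` counts the windows [7, 12], [10, 15], [11, 16], [12, 17] and [13, 18].
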

Definition 2. [14] The time-window of an occurrence, $h$, of
$\alpha$ is given by $[t_{h(v_1)}, t_{h(v_N)}]$. A minimal window
of $\alpha$ is a time-window which contains an occurrence of
$\alpha$, such that no proper sub-window of it contains an
occurrence of $\alpha$. An occurrence in a minimal window is
called a minimal occurrence. The minimal occurrences-based
frequency of $\alpha$ in $\mathbb{D}$ (denoted $f_{mi}$) is
defined as the number of minimal windows of $\alpha$ in
$\mathbb{D}$.

In the example sequence (1) there are 3 minimal windows of
$(A \rightarrow B \rightarrow C)$: [2, 8], [7, 12] and [13, 15].
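
As a sanity check on Definition 2, the minimal windows of a short sequence can be found by brute force: enumerate the time-window of every occurrence and keep those with no proper sub-window in the set. The sketch below takes this exhaustive route purely for illustration (it is not the automata-based method discussed later), and its function names are hypothetical.

```python
from itertools import combinations

# Example event sequence (1), as (event_type, time) pairs.
SEQ = [("A", 1), ("A", 2), ("B", 3), ("A", 6), ("A", 7), ("C", 8),
       ("B", 9), ("D", 11), ("C", 12), ("A", 13), ("B", 14), ("C", 15)]

def occurrence_windows(episode, seq):
    """Time-windows (first time, last time) of every occurrence."""
    windows = set()
    for idx in combinations(range(len(seq)), len(episode)):
        events = [seq[i] for i in idx]
        types_ok = all(e[0] == node for e, node in zip(events, episode))
        times_ok = all(events[i][1] < events[i + 1][1]
                       for i in range(len(events) - 1))
        if types_ok and times_ok:
            windows.add((events[0][1], events[-1][1]))
    return windows

def minimal_windows(episode, seq):
    """Occurrence windows that contain no other occurrence window."""
    wins = occurrence_windows(episode, seq)

    def has_proper_subwindow(w):
        return any(v != w and w[0] <= v[0] and v[1] <= w[1] for v in wins)

    return sorted(w for w in wins if not has_proper_subwindow(w))
```
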

Definition 3. [7] Given a window width $k$, the head
frequency of $\alpha$ is the number of windows of width $k$ which
contain an occurrence of $\alpha$ starting at the left end of the
window, and is denoted as $f_h(\alpha, k)$.

Definition 4. [7] Given a window width $k$, the total
frequency of $\alpha$, denoted as $f_{tot}(\alpha, k)$, is defined
as follows.

$$f_{tot}(\alpha, k) = \min_{\beta \preceq \alpha} f_h(\beta, k) \qquad (2)$$

For a window width of 6, the head frequency $f_h(\gamma, 6)$ of
$\gamma = (A \rightarrow B \rightarrow C)$ in (1) is 4. The total
frequency of $\gamma$, $f_{tot}(\gamma, 6)$, in (1) is 3 because
the head frequency of $(B \rightarrow C)$ in (1) is 3.
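
The two numbers above can be reproduced with the following sketch. It rests on some simplifying assumptions that are not stated in the definitions: event times are distinct (so each start event corresponds to exactly one window), a window of width $k$ is the closed interval $[t, t+k]$, and the boundary conditions on window positions are ignored. The function names are hypothetical.

```python
from itertools import combinations

# Example event sequence (1), as (event_type, time) pairs.
SEQ = [("A", 1), ("A", 2), ("B", 3), ("A", 6), ("A", 7), ("C", 8),
       ("B", 9), ("D", 11), ("C", 12), ("A", 13), ("B", 14), ("C", 15)]

def earliest_end(episode, seq, start):
    """End time of the earliest completion of the episode that begins at
    seq[start], or None if the episode cannot be completed."""
    pos, end = 1, seq[start][1]
    for etype, t in seq[start + 1:]:
        if pos == len(episode):
            break
        if etype == episode[pos]:
            pos, end = pos + 1, t
    return end if pos == len(episode) else None

def head_frequency(episode, seq, k):
    """Number of start events whose earliest completion fits in [t, t + k]."""
    count = 0
    for i, (etype, t) in enumerate(seq):
        if etype != episode[0]:
            continue
        end = earliest_end(episode, seq, i)
        if end is not None and end - t <= k:
            count += 1
    return count

def total_frequency(episode, seq, k):
    """Minimum head frequency over all subepisodes, as in equation (2)."""
    subepisodes = {sub for r in range(1, len(episode) + 1)
                   for sub in combinations(episode, r)}
    return min(head_frequency(sub, seq, k) for sub in subepisodes)
```
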

Definition 5. [10] Two occurrences $h_1$ and $h_2$ of $\alpha$
are said to be non-overlapped if either
$t_{h_1(v_N)} < t_{h_2(v_1)}$ or $t_{h_2(v_N)} < t_{h_1(v_1)}$.
A set of occurrences is said to be non-overlapped if every pair
of occurrences in the set is non-overlapped. A set $H$ of
non-overlapped occurrences of $\alpha$ in $\mathbb{D}$ is maximal
if $|H| \geq |H'|$, where $H'$ is any other set of non-overlapped
occurrences of $\alpha$ in $\mathbb{D}$. The non-overlapped
frequency of $\alpha$ in $\mathbb{D}$ (denoted as $f_{no}$) is
defined as the cardinality of a maximal non-overlapped set of
occurrences of $\alpha$ in $\mathbb{D}$.

Two occurrences are non-overlapped if no event of one occurrence
appears in between events of the other. The notion of a maximal
non-overlapped set is needed since there can be many sets of
non-overlapped occurrences of an episode with different
cardinalities. The non-overlapped frequency of $\gamma$ in (1)
is 2. A maximal set of non-overlapped occurrences is
$\langle (A,2), (B,3), (C,8) \rangle$ and
$\langle (A,13), (B,14), (C,15) \rangle$.

Definition 6. [8] Two occurrences $h_1$ and $h_2$ of $\alpha$
are said to be non-interleaved if either
$t_{h_2(v_j)} > t_{h_1(v_{j+1})}$, $j = 1, 2, \ldots, N-1$, or
$t_{h_1(v_j)} > t_{h_2(v_{j+1})}$, $j = 1, 2, \ldots, N-1$.
A set of occurrences $H$ of $\alpha$ in $\mathbb{D}$ is
non-interleaved if every pair of occurrences in the set is
non-interleaved. A set $H$ of non-interleaved occurrences of
$\alpha$ in $\mathbb{D}$ is maximal if $|H| \geq |H'|$, where
$H'$ is any other set of non-interleaved occurrences of $\alpha$
in $\mathbb{D}$. The non-interleaved frequency of $\alpha$ in
$\mathbb{D}$ (denoted as $f_{ni}$) is defined as the cardinality
of a maximal non-interleaved set of occurrences of $\alpha$ in
$\mathbb{D}$.

The occurrences $\langle (A,2), (B,3), (C,8) \rangle$ and
$\langle (A,7), (B,9), (C,12) \rangle$ are non-interleaved (though
overlapped) occurrences of $(A \rightarrow B \rightarrow C)$ in
$\mathbb{D}$. Together with
$\langle (A,13), (B,14), (C,15) \rangle$, these two occurrences
form a maximal set of non-interleaved occurrences of
$(A \rightarrow B \rightarrow C)$ in (1) and thus $f_{ni} = 3$.

Definition 7. [8] Two occurrences $h_1$ and $h_2$ of $\alpha$
are said to be distinct if they do not share any two events. A
set of occurrences is distinct if every pair of occurrences in
it is distinct. A set $H$ of distinct occurrences of $\alpha$ in
$\mathbb{D}$ is maximal if $|H| \geq |H'|$, where $H'$ is any
other set of distinct occurrences of $\alpha$ in $\mathbb{D}$.
The distinct occurrences-based frequency of $\alpha$ in
$\mathbb{D}$ (denoted as $f_d$) is the cardinality of a maximal
distinct set of occurrences of $\alpha$ in $\mathbb{D}$.

The three occurrences that constituted the maximal set of
non-interleaved occurrences of $(A \rightarrow B \rightarrow C)$
in (1) also form a maximal set of distinct occurrences in (1).

The first frequency proposed in the literature was the
windows-based count [14], which was originally applied to
analyzing alarms in a telecommunication network. It uses an
automata-based algorithm called WINEPI for counting. Candidate
generation exploits the anti-monotonicity property that all
subepisodes are at least as frequent as the parent episode. A
statistical significance test for frequent episodes based on the
windows-based count was proposed in [5]. There is also an
algorithm for discovering frequent episodes with a maximum-gap
constraint under the windows-based count [3].

The minimal windows based frequency and a level- wise pro- 
cedure called MINEPI to track minimal windows were also 
proposed in [14] . This algorithm has high space complexity 
since the exact locations of all the minimal windows of the 
various episodes are kept in memory. Nevertheless, it is use- 
ful in rule generation. An efficient automata-based scheme 
for counting the number of minimal windows (along with 
a proof of correctness) was proposed in [3]. The problem
of statistical significance of minimal windows was recently
addressed in [21]. An algorithm for extracting rules under a
maximal gap constraint and based on minimal occurrences
has been proposed in [15].

In the windows-based frequency, the window width is es- 
sentially an expiry-time constraint (an upper-bound on the 
span of the episodes). However, if the span of an occur- 
rence is much smaller than the window width, then its fre- 
quency is artificially inflated because the same occurrence 
will be found in several successive sliding windows. The 
head frequency measure, proposed in [7], is a variant of the 
windows-based count intended to overcome this problem. 
Based on the notion of head frequency, [6] presents two al- 
gorithms MINEPI+ and EMMA. They also point out how 
head frequency can be a better choice for rule generation 
compared to the windows-based or the minimal windows- 
based counts. Under the head frequency count, however, 
there can be episodes whose frequency is higher than some of
their subepisodes (see [7] for details). To circumvent this, [7]
proposes the idea of total frequency. Currently, there is no
statistical significance analysis based on head frequency or 
total frequency. 

An efficient automata-based counting algorithm under the
non-overlapped frequency measure (along with a proof of
correctness) can be found in [10]. A statistical significance
test for the same is proposed in [9]. However, the algorithm in
[10] does not handle any expiry-time constraints. An efficient
automata-based algorithm for counting non-overlapped occurrences
under an expiry-time constraint was proposed in [8, 9], though
this has higher time and space complexity than the algorithm in
[10]. No proofs of correctness or statistical significance
analysis are available for non-overlapped occurrences under an
expiry-time constraint. Algorithms for frequent episode discovery
under the non-interleaved frequency can be found in [8]. No proofs
of correctness are available for these algorithms.

Another frequency measure we discuss in this paper is based 
on the idea of distinct occurrences. No algorithms are avail- 
able for counting frequencies under this measure. The uni- 
fied view of automata-based counting that we will present 
in this paper can be readily used to design algorithms for 
counting distinct occurrences of episodes. 

3. UNIFIED VIEW OF ALL THE AUTOMATA 
BASED ALGORITHMS 

In this section, we present a generic algorithm for obtaining
frequencies of episodes under the different frequency definitions
listed in Sec. 2.1. The basic ingredient in all the algorithms is
a simple Finite State Automaton (FSA) that is used to recognize
(or track) an episode's occurrences in the event sequence.

The FSA for recognizing occurrences of
$(A \rightarrow B \rightarrow C)$ is illustrated in Fig. 1. In
general, an FSA for an $N$-node serial episode
$\alpha = (\alpha[1] \rightarrow \alpha[2] \rightarrow \cdots \rightarrow \alpha[N])$
has $(N+1)$ states. The first $N$ states are represented by a
pair $(i, \alpha[i+1])$, $i = 0, \ldots, N-1$. The $(N+1)$-th
state is $(N, \phi)$ where $\phi$ is a null symbol. Intuitively,
if the FSA is in state $(j, \alpha[j+1])$, it means that the FSA
has already seen the first $j$ event types of this episode and is
now waiting for $\alpha[j+1]$; if we now encounter an event of
type $\alpha[j+1]$ in the data it can accept it (that is, it can
transit to its next state). The start (first) state of the FSA is
$(0, \alpha[1])$. The $(N+1)$-th state is the accepting state
because when an automaton reaches this state, a full occurrence
of the episode is tracked.

Figure 1: Automaton for tracking occurrences of
$\alpha = (A \rightarrow B \rightarrow C)$

We first explain how these FSA can be used for obtaining
all the different types of frequencies of episodes before
presenting the generic algorithm. While discussing various
algorithms, we represent any occurrence $h$ by
$[t_{h(v_1)} \; t_{h(v_2)} \cdots t_{h(v_N)}]$, which is the
vector of times of the events that constitute the occurrence. For
the discussion of all algorithms in this section, we consider the
example of tracking occurrences of
$\alpha = (A \rightarrow B \rightarrow C \rightarrow D)$ in the
data stream $\mathbb{D}_1$ given by

$\mathbb{D}_1 = \langle (A,1), (B,3), (A,4), (A,5), (C,7), (B,9), (C,11), (A,14), (D,15), (C,16), (B,17), (D,18), (A,19), (C,20), (B,21), (A,22), (D,23), (B,24), (C,25), (D,29), (C,30), (D,31) \rangle$

There is a 'natural' lexicographic order on the set of all
occurrences, $\mathcal{H}$, of any episode $\alpha$, defined
below. This is a total order on $\mathcal{H}$ and it will be
useful in our analysis.

Definition 8. The lexicographic ordering, $<_*$, on the set
of all occurrences of $\alpha$ is defined as follows:
$h_1 <_* h_2$ if the least $i$ for which
$t_{h_1(v_i)} \neq t_{h_2(v_i)}$ is such that
$t_{h_1(v_i)} < t_{h_2(v_i)}$.

The simplest of all automata-based frequency counting
algorithms is the one for counting non-overlapped occurrences
[10], which uses only one automaton per episode. (We call it
algorithm NO here.) At the start, one automaton for each of the
candidate episodes is initialized in its start state. Each of the
automata makes a state transition as soon as a relevant
event-type appears in the data stream. Whenever an automaton
reaches its final state, the frequency of the corresponding
episode is incremented, the automaton is removed from the system
and a fresh automaton for the episode is initialized in the start
state. As is easy to see, this method will count non-overlapped
occurrences of episodes. Under the NO algorithm, we denote the
occurrence tracked by the $i$-th automaton initialized for
$\alpha$ as $h_i^{no}$.

In our example, algorithm NO tracks the following two
occurrences of the episode $\alpha$: (i) $h_1^{no} = [1\ 3\ 7\ 15]$
and (ii) $h_2^{no} = [19\ 21\ 25\ 29]$, and the corresponding
non-overlapped frequency is 2.
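
The single-automaton scheme just described can be sketched in a few lines. This is an illustrative sketch for one serial episode given as a tuple of event types; the function name is hypothetical, and the automaton is represented simply by the list of event times it has accepted so far.

```python
# Data stream D_1 from the text, as (event_type, time) pairs.
D1 = [("A", 1), ("B", 3), ("A", 4), ("A", 5), ("C", 7), ("B", 9),
      ("C", 11), ("A", 14), ("D", 15), ("C", 16), ("B", 17), ("D", 18),
      ("A", 19), ("C", 20), ("B", 21), ("A", 22), ("D", 23), ("B", 24),
      ("C", 25), ("D", 29), ("C", 30), ("D", 31)]

def track_non_overlapped(episode, seq):
    """Algorithm NO sketch: one automaton that transits as early as
    possible and is re-initialized after each complete occurrence."""
    tracked = []
    current = []  # times accepted so far; len(current) is the state index
    for etype, t in seq:
        if etype == episode[len(current)]:
            current.append(t)
            if len(current) == len(episode):
                tracked.append(current)  # full occurrence recognized
                current = []             # fresh automaton in start state
    return tracked
```

The non-overlapped frequency is then `len(track_non_overlapped(episode, seq))`.
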

In this paper we introduce the concept of earliest transit- 
ing occurrence of an episode which is useful for analyzing 
different frequency counting algorithms. 

Definition 9. An occurrence $h$ of $\alpha$ is called earliest
transiting if $E_{h(v_i)}$ is the first occurrence of $\alpha[i]$
after $t_{h(v_{i-1})}$, for all $i = 2, 3, \ldots, N$.

It is easy to see that all occurrences tracked by algorithm NO
are earliest transiting. Let $\mathcal{H}^e$ denote the set of
all earliest transiting occurrences of a given episode. We denote
the $i$-th occurrence (as per the lexicographic ordering of
occurrences) in $\mathcal{H}^e$ as $h_i^e$. There are 6 earliest
transiting occurrences of $\alpha$ in $\mathbb{D}_1$. They are
$h_1^e = [1\ 3\ 7\ 15]$, $h_2^e = [4\ 9\ 11\ 15]$,
$h_3^e = [5\ 9\ 11\ 15]$, $h_4^e = [14\ 17\ 20\ 23]$,
$h_5^e = [19\ 21\ 25\ 29]$ and $h_6^e = [22\ 24\ 25\ 29]$. The
earliest transiting occurrences tracked by the NO algorithm are
$h_1^{no} = h_1^e$ and $h_2^{no} = h_5^e$.

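
The six earliest transiting occurrences listed above can be enumerated directly from Definition 9: start at each event of type $\alpha[1]$ and always accept the first available event of the next type. The sketch below does exactly this; the function name is hypothetical.

```python
# Data stream D_1 from the text, as (event_type, time) pairs.
D1 = [("A", 1), ("B", 3), ("A", 4), ("A", 5), ("C", 7), ("B", 9),
      ("C", 11), ("A", 14), ("D", 15), ("C", 16), ("B", 17), ("D", 18),
      ("A", 19), ("C", 20), ("B", 21), ("A", 22), ("D", 23), ("B", 24),
      ("C", 25), ("D", 29), ("C", 30), ("D", 31)]

def earliest_transiting(episode, seq):
    """All earliest transiting occurrences, as lists of event times,
    in order of their start events."""
    occurrences = []
    for i, (etype, t) in enumerate(seq):
        if etype != episode[0]:
            continue
        times = [t]
        for next_type, next_t in seq[i + 1:]:
            # Accept the first event of the type currently waited for.
            if len(times) < len(episode) and next_type == episode[len(times)]:
                times.append(next_t)
        if len(times) == len(episode):
            occurrences.append(times)
    return occurrences
```
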
While the algorithm NO is very simple and efficient, it cannot
handle any expiry-time constraint. Recall that the expiry-time
constraint specifies an upper bound, $T_X$, on the span of any
occurrence that is counted. Suppose we want to count with
$T_X = 9$. Both the occurrences tracked by NO have spans greater
than 9 and hence the resulting frequency count would be zero.
However, $h_4^e$ is an occurrence which satisfies the expiry-time
constraint. Algorithm NO cannot track $h_4^e$ because it uses
only one automaton per episode and the automaton has to make a
state transition as soon as the relevant event-type appears in
the data. To overcome this limitation, the algorithm can be
modified so that a new automaton is initialized in the start
state whenever an existing automaton moves out of its start
state. All automata make state transitions as soon as they are
possible. Each such automaton would track an earliest transiting
occurrence. In this process, two automata may reach the same
state. In our example, after seeing $(A,5)$, the second and third
automata to be initialized for $\alpha$ would be waiting in the
same state (ready to accept the next $B$ in the data). Clearly,
both automata will make state transitions on the same events from
now on and so we need to keep only one of them. We retain the
newer (most recently initialized) automaton, in this case the
third automaton, since the span of the occurrence tracked by it
would be smaller. When an automaton reaches its final state, if
the span of the occurrence tracked by it does not exceed $T_X$,
then the corresponding frequency is incremented and all automata
of the episode except the one waiting in the start state are
retired. (This ensures we are tracking only non-overlapped
occurrences.) When the occurrence tracked by the automaton that
reaches the final state fails the expiry constraint, we just
retire the current automaton; any other automata for the episode
will continue to accept events. Under this modified algorithm, in
$\mathbb{D}_1$, the first automaton that reaches its final state
tracks $h_3^e$, which violates the expiry-time constraint of
$T_X = 9$. So, we drop only this automaton. The next automaton
that reaches its final state tracks $h_4^e$. This occurrence has
span 9, which satisfies $T_X = 9$. Hence we increment the
corresponding frequency count and retire all current automata for
this episode. Since there are no other occurrences non-overlapped
with $h_4^e$, the final frequency would be 1. We denote this
algorithm for counting the non-overlapped occurrences under an
expiry-time constraint as NO-X. The occurrences tracked by both
NO and NO-X would be earliest transiting.

Note that several earliest transiting occurrences may end
simultaneously. For example, in $\mathbb{D}_1$, $h_1^e$, $h_2^e$
and $h_3^e$ all end together at $(D,15)$. Both
$\{h_1^e, h_5^e\}$ and $\{h_3^e, h_6^e\}$ form maximal sets of
non-overlapped occurrences. Sometimes (e.g., when determining the
distribution of spans of occurrences for an episode) we would
like to track the innermost one among the occurrences that are
ending together. In this example, this means we want to track the
set of occurrences $\{h_3^e, h_6^e\}$. This can be done by simply
omitting the expiry-time check in the NO-X algorithm. (That is,
whenever an automaton reaches the final state, irrespective of
the span of the occurrence tracked by it, we increment the
frequency and retire all other automata except for the one in the
start state.) We denote this as the NO-I algorithm and this is
the algorithm proposed in [9].
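
The NO-X and NO-I schemes can be sketched together, since NO-I is NO-X without the expiry check. The code below is a minimal sketch under stated assumptions, not the implementation from the cited work: the function name is hypothetical, the episode has at least two nodes, an occurrence satisfies the constraint when its span is at most the threshold (which is what the worked example with threshold 9 requires), and the start-state automaton is kept implicit. Because the newer automaton always replaces an older one in the same state, a dictionary with one entry per state is enough.

```python
# Data stream D_1 from the text, as (event_type, time) pairs.
D1 = [("A", 1), ("B", 3), ("A", 4), ("A", 5), ("C", 7), ("B", 9),
      ("C", 11), ("A", 14), ("D", 15), ("C", 16), ("B", 17), ("D", 18),
      ("A", 19), ("C", 20), ("B", 21), ("A", 22), ("D", 23), ("B", 24),
      ("C", 25), ("D", 29), ("C", 30), ("D", 31)]

def track_no_x(episode, seq, expiry=None):
    """NO-X sketch; with expiry=None it behaves like NO-I."""
    n = len(episode)  # assumed to be at least 2
    active = {}       # state j -> times of the automaton waiting for episode[j]
    tracked = []
    for etype, t in seq:
        counted = False
        # Visit higher states first so each automaton moves at most once.
        for j in sorted(active, reverse=True):
            if episode[j] != etype:
                continue
            times = active.pop(j) + [t]
            if j + 1 < n:
                # Overwriting retires any older automaton already there.
                active[j + 1] = times
            elif expiry is None or times[-1] - times[0] <= expiry:
                tracked.append(times)
                active.clear()  # retire all but the start-state automaton
                counted = True
                break
            # Otherwise the span is too large: only this automaton is dropped.
        if etype == episode[0] and not counted:
            active[1] = [t]  # a new automaton leaves the start state
    return tracked
```

On $\mathbb{D}_1$ with a threshold of 9 this sketch tracks only $[14\ 17\ 20\ 23]$, and without a threshold it tracks $[5\ 9\ 11\ 15]$ and $[22\ 24\ 25\ 29]$.
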

In NO-I, if we only retire automata that reached their final
states (rather than retire all automata except the one in the
start state), we have an algorithm for counting minimal
occurrences (denoted MO). In our example, the automata tracking
$h_3^e$, $h_4^e$ and $h_6^e$ are the ones that reach their final
states in this algorithm. The time-windows of these occurrences
constitute the set of all minimal windows of $\alpha$ in
$\mathbb{D}_1$. Expiry-time constraints can be incorporated by
incrementing the frequency only when the occurrence tracked has a
span within the expiry-time threshold. The corresponding
expiry-time algorithm is referred to as MO-X.
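
A sketch of MO (and MO-X, via an optional threshold) follows. It uses the same representation as in the earlier description: at most one automaton per state, with the newer automaton replacing the older one when two meet. The function name is hypothetical, the episode is assumed to have at least two nodes, and the span check uses "at most the threshold" as an assumption.

```python
# Data stream D_1 from the text, as (event_type, time) pairs.
D1 = [("A", 1), ("B", 3), ("A", 4), ("A", 5), ("C", 7), ("B", 9),
      ("C", 11), ("A", 14), ("D", 15), ("C", 16), ("B", 17), ("D", 18),
      ("A", 19), ("C", 20), ("B", 21), ("A", 22), ("D", 23), ("B", 24),
      ("C", 25), ("D", 29), ("C", 30), ("D", 31)]

def track_minimal(episode, seq, expiry=None):
    """MO sketch (MO-X when `expiry` is given): only the automaton that
    reaches its final state is retired."""
    n = len(episode)  # assumed to be at least 2
    active = {}       # state j -> times of the automaton waiting for episode[j]
    tracked = []
    for etype, t in seq:
        for j in sorted(active, reverse=True):  # higher states first
            if episode[j] != etype:
                continue
            times = active.pop(j) + [t]
            if j + 1 < n:
                active[j + 1] = times  # newer automaton replaces older one
            elif expiry is None or times[-1] - times[0] <= expiry:
                tracked.append(times)  # counted; other automata keep running
        if etype == episode[0]:
            active[1] = [t]
    return tracked
```
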

The windows-based counting algorithm (which we refer to 
as WB) is also based on tracking earliest transiting occur- 
rences. WB also uses multiple automata per episode to track 
minimal occurrences of episodes like in MO. The only dif- 
ference lies in the way frequency is incremented. The algo- 
rithm essentially remembers, for each candidate episode, the 
last minimal window in which the candidate was observed. 
Then, at each time tick, effectively, if this last minimal
window lies within the current sliding window of width $T_X$,
frequency is incremented by one. This is because an occurrence
of episode $\alpha$ exists in a given window $w$ if and only if
$w$ contains a minimal window of $\alpha$.

It is easy to see that head frequency with a window width of
$T_X$ is simply the number of earliest transiting occurrences
whose span does not exceed $T_X$. Thus we can have a head
frequency counting algorithm (referred to here as HD) that is
similar to MO-X except that when two automata reach the same
state simultaneously we do not remove the older automaton. This
way, HD will track all earliest transiting occurrences which
satisfy an expiry-time constraint of $T_X$. For $T_X = 10$ and
for episode $\alpha$, HD tracks $h_3^e$, $h_4^e$, $h_5^e$ and
$h_6^e$ and returns a frequency count of 4. The total frequency
count for an episode $\alpha$ is the minimum of the head
frequencies of all its subepisodes (including itself). This can
be computed as the minimum of the head frequency of $\alpha$ and
the total frequency of its $(N-1)$-suffix subepisode, which would
have been computed in the previous pass over the data. (See [7]
for details.) The head frequency counting algorithm can have high
space complexity as all the time instants at which automata make
their first state transition need to be remembered.

The non-interleaved frequency counting algorithm (which we
refer to as NI) differs from the minimal occurrence algorithm in
that an automaton makes a state transition only if there is no
other automaton of the same episode in the destination state.
Unlike the other frequency counting algorithms discussed so far,
such an FSA transition policy will track occurrences which are
not necessarily earliest transiting. In our example, until the
event $(A,4)$ in the data sequence, both the minimal and
non-interleaved algorithms make identical state transitions.
However, on $(A,5)$, NI will not allow the automaton in state
$(0,A)$ to make a state transition as there is already an active
automaton for $\alpha$ in state $(1,B)$ which had accepted
$(A,4)$ earlier. Eventually, NI tracks the occurrences
$h_1^{ni} = [1\ 3\ 7\ 15]$, $h_2^{ni} = [4\ 9\ 16\ 18]$,
$h_3^{ni} = [14\ 17\ 20\ 23]$ and $h_4^{ni} = [19\ 21\ 25\ 29]$.

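
The NI transition policy can be sketched as below. The function name is hypothetical and the episode is assumed to have at least two nodes. One design choice here is an assumption rather than something stated in the text: whether a destination state is free is decided from the occupancy before the current event is processed, so an automaton cannot enter a state that another automaton vacates on the same event.

```python
# Data stream D_1 from the text, as (event_type, time) pairs.
D1 = [("A", 1), ("B", 3), ("A", 4), ("A", 5), ("C", 7), ("B", 9),
      ("C", 11), ("A", 14), ("D", 15), ("C", 16), ("B", 17), ("D", 18),
      ("A", 19), ("C", 20), ("B", 21), ("A", 22), ("D", 23), ("B", 24),
      ("C", 25), ("D", 29), ("C", 30), ("D", 31)]

def track_non_interleaved(episode, seq):
    """NI sketch: an automaton transits only into an unoccupied state."""
    n = len(episode)  # assumed to be at least 2
    active = {}       # state j -> times; at most one automaton per state
    tracked = []
    for etype, t in seq:
        occupied = set(active)  # occupancy before this event
        for j in sorted(occupied, reverse=True):
            if episode[j] != etype:
                continue
            if j + 1 == n:
                # The final state is never occupied, so always accept.
                tracked.append(active.pop(j) + [t])
            elif j + 1 not in occupied:
                active[j + 1] = active.pop(j) + [t]
            # Otherwise the destination is taken and the automaton waits.
        if etype == episode[0] and 1 not in occupied:
            active[1] = [t]
    return tracked
```
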
While there are no algorithms reported for counting distinct
occurrences, we can construct one using the same ideas. Such an
algorithm (to be called DO) differs from the one for counting
minimal occurrences in allowing multiple automata for an episode
to reach the same state. However, on seeing an event
$(E_i, t_i)$ which multiple automata can accept, only one of the
automata (the oldest among those in the same state) is allowed to
make a state transition; the others continue to wait for future
events with the same event-type as $E_i$ to make their state
transitions. The maximal set of distinct occurrences of $\alpha$
in $\mathbb{D}_1$ is $h_1^d = h_1^e$, $h_2^d = [4\ 9\ 11\ 18]$,
$h_3^d = [5\ 17\ 20\ 23]$, $h_4^d = [14\ 21\ 25\ 29]$ and
$h_5^d = [19\ 24\ 30\ 31]$, which are the ones tracked by this
algorithm.

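
A possible realization of DO keeps a first-in, first-out queue of automata per state, so that the oldest automaton waiting in a state is the one that accepts the next matching event. This is a sketch only: the function name is hypothetical, the episode is assumed to have at least two nodes, and the event types within the episode are assumed to be distinct so that no single event can be accepted from two different states.

```python
from collections import deque

# Data stream D_1 from the text, as (event_type, time) pairs.
D1 = [("A", 1), ("B", 3), ("A", 4), ("A", 5), ("C", 7), ("B", 9),
      ("C", 11), ("A", 14), ("D", 15), ("C", 16), ("B", 17), ("D", 18),
      ("A", 19), ("C", 20), ("B", 21), ("A", 22), ("D", 23), ("B", 24),
      ("C", 25), ("D", 29), ("C", 30), ("D", 31)]

def track_distinct(episode, seq):
    """DO sketch: several automata may wait in one state, but only the
    oldest of them accepts a given event."""
    n = len(episode)  # assumed to be at least 2
    # waiting[j] holds automata (oldest first) waiting for episode[j];
    # waiting[0] is unused because the start state is implicit.
    waiting = [deque() for _ in range(n)]
    tracked = []
    for etype, t in seq:
        for j in range(n - 1, 0, -1):  # higher states first
            if episode[j] == etype and waiting[j]:
                times = waiting[j].popleft() + [t]
                if j + 1 == n:
                    tracked.append(times)
                else:
                    waiting[j + 1].append(times)
        if etype == episode[0]:
            waiting[1].append([t])  # every start event spawns an automaton
    return tracked
```
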
We can also consider counting all occurrences of an episode
even though it may be inefficient. The algorithm for counting
all occurrences (referred to as AO) allows all automata to make
transitions whenever the appropriate events appear in the data
sequence. However, at each state transition, a copy of the
automaton in the earlier state is added to the set of active
automata for the episode.

From the above discussion, it is clear that by manipulating 
the FSA (that recognize occurrences) in different ways we 
get counting schemes for different frequencies. The choices 
to be made in different algorithms essentially concern when 
to initiate a new automaton in the start state, when to re- 
tire an existing automaton, when to effect a possible state 
transition and when (and by how much) to increment the
frequency. We now present a unified scheme incorporating all
this in Algorithm 1 for obtaining frequencies of a set of serial
episodes. This algorithm has five boolean variables, namely,
TRANSIT, COPY-AUTOMATON, JOIN-AUTOMATON, INCREMENT-FREQ and
RETIRE-AUTOMATON. The counting algorithms for all the different
frequencies are obtained from this general algorithm by suitably
setting the values of these boolean variables (either to some
constants or to values calculated using the current context in
the algorithm). Tables 2-6 specify the choices needed to obtain
the algorithms for different frequencies. (A list of all
algorithms is given in Table 1.)

As can be seen from our general algorithm, when an event
type for which an automaton is waiting is encountered in the
data, the automaton can accept it only if the variable TRANSIT
is true. Hence for all algorithms that track earliest transiting
occurrences, TRANSIT is set to true, as can be seen from Table 2.
For algorithms NI and DO, we allow the state transition only if
some condition is satisfied. The condition COPY-AUTOMATON
(Table 3) is for deciding whether or not to leave another
automaton in the current state when an automaton is transiting to
the next state. Except for NO and AO, we create such a copy only
when the currently transiting automaton is moving out of its
start state. In NO we never make such a copy (because this
algorithm uses only one automaton per episode) while in AO we
need to do it for every state transition. As we have seen
earlier, in some of the algorithms, when two automata for an
episode reach the same state, the older automaton is removed.
This is controlled by JOIN-AUTOMATON, as given by Table 4.
INCREMENT-FREQUENCY (Table 5) is the condition under which the
frequency of an episode is incremented when an automaton reaches
its final state. This increment is always done for algorithms
that have no expiry-time constraint or window width. For the
others we increment the frequency only if the occurrence tracked
satisfies the constraint. The RETIRE-AUTOMATA condition (Table 6)
is concerned with removal of all automata of an episode when a
complete occurrence has been tracked. This condition is true only
for the non-overlapped occurrences-based counting algorithms.

Apart from the five boolean variables explained above, our
general algorithm contains one more variable, namely INC,
which decides the amount by which the frequency is incremented
when an automaton reaches the final state. Its values for
different frequency counts are listed in Table 7. For all
algorithms except WB, we set INC = 1. We now explain
how the frequency is incremented in WB. To count the
number of sliding windows that contain at least one occurrence
of the episode, whenever a new minimal occurrence
enters a sliding window, we calculate the number
of consecutive windows in which this new minimal occurrence
will be found. For example, in $D_1$, with a window-width
of $T_X = 16$, consider the first minimal occurrence of
$(A \rightarrow B \rightarrow C \rightarrow D)$, namely, the occurrence constituted by
events (A, 5), (B, 9), (C, 11) and (D, 15). The first sliding
window in which this occurrence can be found is [-1, 15].
The occurrence stays in consecutive sliding windows until
the sliding window [5, 21]. When this first minimal occurrence
enters the sliding window [-1, 15], we observe that
there is no other 'older' minimal occurrence in [-1, 15], and
hence, as per the else condition in Table 7, the frequency is
incremented by INC = (5 - (-1) + 1) = 7. Similarly, when the second
minimal occurrence enters the sliding window [7, 23], we
increment the frequency by INC = (14 - 7 + 1) = 8. The third minimal
occurrence (constituted by the events (A, 22), (B, 24), (C, 25)
and (D, 29)) first enters the sliding window [13, 29], with the
second minimal occurrence still lying within this window.
This third minimal occurrence remains in consecutive sliding
windows until [22, 38]. As per the if condition of Table 7,
the frequency is incremented by INC = 22 - 14 = 8. We note that such
an implementation of the windows-based algorithm removes
the need for the beginsat(t) list of [14], which was used to
store all automata whose first state transition occurred at
time-tick t.
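The arithmetic of this worked example can be reproduced with a small sketch. It is only an illustration under stated assumptions: time ticks are integers, a window of width $T_X$ starting at s is the closed interval [s, s + $T_X$] as in the example, every minimal occurrence fits within one window, and only the start and end times of each minimal occurrence are needed. The function names `wb_increments` and `brute_force_window_count` are hypothetical and do not come from the paper.

```python
def wb_increments(minimal_occurrences, tx):
    """INC for each minimal occurrence, given as (start, end) in time order."""
    incs = []
    prev_start = None
    for start, end in minimal_occurrences:
        first_window_start = end - tx  # first window containing the occurrence
        last_window_start = start      # last window containing the occurrence
        if prev_start is not None and prev_start >= first_window_start:
            # "if" branch of Table 7: the previous minimal occurrence is
            # still inside the first window of the current one.
            inc = last_window_start - prev_start
        else:
            # "else" branch of Table 7.
            inc = last_window_start - first_window_start + 1
        incs.append(inc)
        prev_start = start
    return incs


def brute_force_window_count(minimal_occurrences, tx):
    """Count windows [s, s + tx] that contain at least one occurrence."""
    lo = min(end for _, end in minimal_occurrences) - tx
    hi = max(start for start, _ in minimal_occurrences)
    return sum(
        1
        for s in range(lo, hi + 1)
        if any(s <= start and end <= s + tx for start, end in minimal_occurrences)
    )


# Start and end times of the three minimal occurrences in the example.
example = [(5, 15), (14, 23), (22, 29)]
print(wb_increments(example, 16))                  # [7, 8, 8]
print(sum(wb_increments(example, 16)))             # 23
print(brute_force_window_count(example, 16))       # 23
```

The brute-force count is included only as a cross-check that the incremental INC values add up to the number of windows containing an occurrence.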

Remark 1. Even though we included AO (for counting
all occurrences of an episode) for the sake of completeness, it
is not a good frequency measure. This is mainly because it
does not seem to satisfy any anti-monotonicity condition.
For example, consider the data sequence ⟨AABBCC⟩.
There are 8 occurrences of $(A \rightarrow B \rightarrow C)$ but only 4 occurrences
of each of its 2-node subepisodes. Also, its space
complexity can be high.
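The counts quoted in Remark 1 can be checked with a short sketch. It does not use the automata-based AO algorithm; instead it uses a standard subsequence-counting dynamic program, which is an assumption made here purely for verification. The name `count_all_occurrences` is hypothetical.

```python
def count_all_occurrences(sequence, episode):
    """Number of occurrences of a serial episode as an ordered subsequence."""
    # ways[j] = number of ways to match the first j nodes of the episode so far
    ways = [1] + [0] * len(episode)
    for event in sequence:
        # Go from the last node backwards so one event is not used twice.
        for j in range(len(episode), 0, -1):
            if episode[j - 1] == event:
                ways[j] += ways[j - 1]
    return ways[-1]


print(count_all_occurrences("AABBCC", "ABC"))  # 8
print(count_all_occurrences("AABBCC", "AB"))   # 4
print(count_all_occurrences("AABBCC", "BC"))   # 4
print(count_all_occurrences("AABBCC", "AC"))   # 4
```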

Remark 2. The quantitative relationships between the
different frequency counts for a given episode can be described
as follows:

$$f_{all} \geq f_h \geq f_{tot} \geq f_d \geq f_{ni} \geq f_{mi} \geq f_{no} \qquad (3)$$

where $f_{all}$ denotes the frequency of an episode under AO,
while $f_h$ and $f_{tot}$ denote the corresponding head and total
frequencies defined with a window-width exceeding the total
time-span of the event sequence. For a large sliding window
width, the head frequency $f_h$ is the same as the number of
earliest transiting occurrences of an episode. In general, the
inequality $f_d \geq f_{ni}$ holds only for injective episodes. (An
episode $\alpha$ is injective if it does not contain any repeated
event-types.) All other inequalities are true for any serial
episode. The first inequality is obvious. The second inequality
follows directly from equation (2) in Definition 4. Given a
set of $f$ maximal distinct occurrences of an episode $\alpha$ in a
data stream D, one can extract that many earliest transiting


Table 1: Various frequency counts

| Abbreviation | Frequency count |
|---|---|
| WB | Windows based |
| MO | Minimal Occurrences based |
| MO-X | Minimal Occurrence with Expiry time constraints |
| NO | Non-overlapped |
| NO-I | Non-overlapped innermost |
| NO-X | Non-overlapped with Expiry time constraints |
| NI | Non-interleaved |
| DO | Distinct occurrences based |
| AO | All occurrences based |
| HD | Head frequency |


Table 2: Conditions for TRANSIT=TRUE

| Algorithms | Condition |
|---|---|
| WB, MO, MO-X, HD, NO, NO-X, NO-I, AO | Always |
| NI | If ∄ earlier automaton for α in next state j |
| DO | No other earlier automaton for α waiting in same state can transit on event $(E_i, t_i)$. |


Table 3: Conditions for COPY-AUTOMATON=TRUE

| Algorithms | Condition |
|---|---|
| WB, MO, MO-X, HD, NI, NO-X, NO-I, DO | Only if A is in start state |
| NO | Never |
| AO | Always |


Table 4: Conditions for JOIN-AUTOMATON=TRUE

| Algorithms | Condition |
|---|---|
| WB, MO, MO-X, NO-X, NO-I | Always |
| DO, AO, HD, NO, NI | Never |


Table 5: Conditions for INCREMENT-FREQ=TRUE

| Algorithms | Condition |
|---|---|
| MO, NO, NI, DO, AO, NO-I | Always |
| WB, NO-X, MO-X, HD | If time difference between first and last state transitions is less than $T_X$ (window-width for WB, expiry time for others) |


Table 6: Conditions for RETIRE-AUTOMATON=TRUE

| Algorithms | Condition |
|---|---|
| NO, NO-X, NO-I | Always |
| WB, MO, MO-X, HD, NI, DO, AO | Never |


Table 7: Values taken by INC

INC = 1 for all counts except WB.
For the windows-based count (WB):
    if (the first window which contains the current minimal occurrence
        also contains the previous minimal occurrence) then
        INC = time difference between the start of the last window containing
              the current minimal occurrence and the start of the last
              window which contains the previous minimal occurrence.
    else
        INC = time difference between the first and last windows
              containing the current occurrence, plus 1.


Algorithm 1 Unified Algorithm for counting serial episodes
Input: Set $C_N$ of N-node serial episodes, event stream $D = \langle (E_1, t_1), \ldots, (E_n, t_n) \rangle$
Output: Frequencies of episodes in $C_N$

 1: for all α ∈ C_N do
 2:     Add automaton of α waiting in the start state.
 3:     Initialize frequency of α to ZERO.
 4: for i = 1 to n do
 5:     for each automaton, A, ready to accept event-type E_i do
 6:         α = candidate associated with A;
 7:         j = state which A is ready to transit into;
 8:         if TRANSIT then
 9:             if COPY-AUTOMATON then
10:                 Add copy of A to collection of automata.
11:             Transit A to state j.
12:             if ∃ an earlier automaton of α already in state j but not waiting for E_i then
13:                 if JOIN-AUTOMATON then
14:                     Retain A and retire earlier automaton.
15:             if A reached final state then
16:                 Retire A.
17:                 if INCREMENT-FREQ then
18:                     Increment frequency of α by INC.
19:                 if RETIRE-AUTOMATON then
20:                     Retire all automata of α and create a state '0' automaton.
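As an illustration of how the boolean variables select a counting scheme, the following is a minimal Python sketch of Algorithm 1 restricted to a single episode and to three modes without time constraints (MO, NO and AO), so TRANSIT and INCREMENT-FREQ are always true and INC = 1. Events are given as a string of single-character event types in time order. The class `Automaton`, the dictionary `MODES` and the function `count_episode` are hypothetical names chosen for this sketch, and the processing order (automata in higher states first) is an implementation assumption, not something prescribed by the paper.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(eq=False)  # eq=False: automata are compared by identity
class Automaton:
    ident: int      # creation order
    state: int = 0  # number of episode nodes matched so far


# Flag settings read off Tables 3, 4 and 6 for three of the modes.
# "copy" is a function of the state the automaton is about to leave.
MODES = {
    "MO": dict(copy=lambda s: s == 0, join=True, retire_all=False),
    "NO": dict(copy=lambda s: False, join=False, retire_all=True),
    "AO": dict(copy=lambda s: True, join=False, retire_all=False),
}


def count_episode(events, episode, mode):
    cfg = MODES[mode]
    n = len(episode)
    ids = count()
    automata = [Automaton(next(ids))]
    freq = 0
    for ev in events:
        # Snapshot of the automata waiting for this event. Higher states go
        # first, so an older automaton leaves state j before a newer enters it.
        ready = sorted((a for a in automata if episode[a.state] == ev),
                       key=lambda a: -a.state)
        for a in ready:
            if a not in automata:  # already retired during this event
                continue
            if cfg["copy"](a.state):               # COPY-AUTOMATON
                automata.append(Automaton(next(ids), a.state))
            a.state += 1                           # transit A to state j
            if a.state == n:                       # A reached final state
                automata.remove(a)
                freq += 1                          # INC = 1
                if cfg["retire_all"]:              # RETIRE-AUTOMATON
                    automata = [Automaton(next(ids))]
            elif cfg["join"]:                      # JOIN-AUTOMATON
                automata = [b for b in automata
                            if b is a or b.state != a.state]
    return freq


print(count_episode("ABACBDCD", "ABCD", "MO"))  # 2
print(count_episode("ABACBDCD", "ABD", "MO"))   # 1
print(count_episode("ABACBDCD", "ABCD", "NO"))  # 1
print(count_episode("AABBCC", "ABC", "AO"))     # 8
```

Extending the sketch to the remaining modes would require timestamps (for the expiry and window checks of Table 5) and the conditional TRANSIT rules of Table 2; those are deliberately left out here.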



occurrences of not only $\alpha$ but also of all its subepisodes in
D. Hence the third inequality is also true. Also, it is easy
to verify that a set of non-interleaved occurrences of an injective
episode are also distinct, which validates the fourth
inequality. We will show the correctness of the remaining
two inequalities in the next section.

4. PROOFS OF CORRECTNESS 

In this section, we present proofs of correctness of the different
frequency counting algorithms presented in Sec. 3 (all
of which are specific instances of Algorithm 1).
In our proofs, we consider the case of event sequences with
distinct occurrence-times for events. When we are not considering
expiry-time constraints, the actual values of times
of occurrences of different events are not really important;
only the time ordering of the events is important in deciding
on the occurrences of episodes. Hence, in this section we will
use $h(v_1)$ interchangeably with $t_{h(v_1)}$, the time of the first
event in the occurrence $h$, and so on. Modifications needed
in the case of data having multiple events with the same
time of occurrence are discussed at the end of the section.

4.1 Minimal Window Counting algorithm 

First, we analyze the minimal occurrences counting algorithm
(MO). Our proof methodology is different from the
one presented in [4], where the algorithm is viewed as computing
a table $S[0 \ldots n, 1 \ldots N]$, where $S[i, j]$ is the largest
value $k \leq i$ such that $E_k, \ldots, E_i$ contains an occurrence of
$\alpha[1] \rightarrow \cdots \rightarrow \alpha[j]$, using dynamic programming. The algorithm,
after processing $E_i$, stores the $i$-th row of this matrix.
The dynamic programming recursion helps compute
the $i$-th row of this matrix from its $(i-1)$-th row. Whenever
$S[i, N] > S[i-1, N]$, the count is incremented since a new
minimal occurrence is recognized. Viewed from an automata
perspective, the $i$-th row of the matrix essentially stores the
first state transition times of the currently active automata.
Our analysis of the minimal occurrence algorithm also leads
to an analysis and proof for counting non-overlapped occurrences
(NO and NO-X) as well. Another advantage of our
proof strategy is that it may be generalized to the case of
episodes with general partial orders. (We briefly discuss this
in Section 6.)

Lemma 1. Suppose $h$ is an earliest transiting occurrence
of an N-node episode $\alpha$. If $h'$ is any general occurrence such
that $h <_* h'$, then $h(v_i) \leq h'(v_i)\ \forall i = 1, 2, \ldots, N$.

This lemma follows easily from the definition of the lexico- 
graphic ordering, <*, and the definition of earliest transiting 
occurrence. 

Remark 3. Recall that $h^e_i$ is the $i$-th earliest transiting
(ET) occurrence of an episode. Thus, by definition, $h^e_i(v_1) < h^e_j(v_1)$
and $h^e_i <_* h^e_j$ whenever $i < j$. Hence, from the
above lemma, we have $h^e_i(v_k) \leq h^e_j(v_k)$ for all $k$ and $i < j$.
In particular, we have $h^e_i(v_1) < h^e_{i+1}(v_1)$ and
$h^e_i(v_N) \leq h^e_{i+1}(v_N)$, for an N-node episode.

The main idea of our proof is that to find all minimal win- 
dows of an episode, it is enough to capture a certain subset 
of earliest transiting occurrences. 

Lemma 2. An earliest transiting (ET) occurrence $h^e_i$ of
an N-node episode is not a minimal occurrence if and only
if $h^e_i(v_N) = h^e_{i+1}(v_N)$.

Proof. The 'if' part follows easily from Remark 3. For
the 'only if' part, let us denote by $w = [n_s, n_e] = [h^e_i(v_1), h^e_i(v_N)]$
the window of $h^e_i$. Given that $w$ is not a minimal window,
we need to show that $h^e_i(v_N) = h^e_{i+1}(v_N)$. Since $w$ is not a
minimal window, one of its proper sub-windows contains an
occurrence, say $h$, of this episode. That means if $h$ starts at
$n_s$ then it must end before $n_e$. But, since $h^e_i$ is earliest transiting,
any occurrence starting at the same event as $h^e_i$ cannot
end before $h^e_i$. Thus we must have $h(v_1) > h^e_i(v_1)$. This
means, by Lemma 1, since $h^e_i$ is earliest transiting, we cannot
have $h^e_i(v_N) > h(v_N)$. Since the window of $h$ has to be contained
in the window of $h^e_i$, we thus have $h^e_i(v_N) = h(v_N)$.
By definition, $h^e_{i+1}$ will start at the earliest possible position
after $h^e_i$. Since there is an occurrence starting with
$h(v_1)$, we must have $h^e_{i+1}(v_1) \leq h(v_1)$. Now, since $h^e_{i+1}$ is
earliest transiting, it cannot end after $h$. Thus we must
have $h^e_{i+1}(v_N) \leq h(v_N)$. Also, $h^e_{i+1}$ cannot end earlier than
$h^e_i$ because both are earliest transiting. Thus, we must have
$h^e_i(v_N) = h^e_{i+1}(v_N)$. This completes the proof of the lemma. □

Remark 4. This lemma shows that any ET occurrence
$h^e_i$ such that $h^e_i(v_N) < h^e_{i+1}(v_N)$ is a minimal occurrence.
The converse is also true. Consider a minimal window $w = [n_s, n_e]$.
Since this is a minimal window, there is an occurrence
(and hence an ET occurrence) starting at $n_s$. Denote
this ET occurrence by $h^e_i$. We know $h^e_i(v_N) = n_e$ because
$w$ is a minimal window. Then the next ET occurrence $h^e_{i+1}$
has to start after $n_s$ and has to end beyond $n_e$ because $w$ is
minimal. Thus we have $h^e_i(v_N) < h^e_{i+1}(v_N)$.
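Lemma 2 and Remark 4 suggest a direct, non-automata way to list minimal occurrences: enumerate the ET occurrences and keep those whose last event is strictly earlier than that of the next ET occurrence. The sketch below assumes single-character event types, uses 0-based positions in place of times, and treats the last ET occurrence as minimal because no later ET occurrence shares its end. The names `et_occurrences` and `minimal_occurrences` are hypothetical.

```python
def et_occurrences(sequence, episode):
    """Earliest transiting occurrences, as tuples of 0-based event positions."""
    occurrences = []
    for start, event in enumerate(sequence):
        if event != episode[0]:
            continue
        occ, pos = [start], start
        for symbol in episode[1:]:
            # Earliest event of the required type after the previous one.
            pos = next((p for p in range(pos + 1, len(sequence))
                        if sequence[p] == symbol), None)
            if pos is None:
                break
            occ.append(pos)
        else:
            occurrences.append(tuple(occ))
    return occurrences


def minimal_occurrences(sequence, episode):
    """ET occurrences h_i with h_i(v_N) < h_{i+1}(v_N), as in Remark 4."""
    occs = et_occurrences(sequence, episode)
    return [h for i, h in enumerate(occs)
            if i + 1 == len(occs) or h[-1] < occs[i + 1][-1]]


print(minimal_occurrences("ABACBDCD", "ABCD"))  # [(0, 1, 3, 5), (2, 4, 6, 7)]
print(minimal_occurrences("ABACBDCD", "ABD"))   # [(2, 4, 5)]
```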

Now we are ready to prove correctness of the MO algorithm.
Consider Algorithm 1 operating in the MO (minimal occurrence)
mode for tracking occurrences of an N-node episode
$\alpha$. Since TRANSIT is always true in the MO mode, all
automata would be tracking ET occurrences. Since COPY-AUTOMATON
is true in MO mode whenever an automaton
transits out of the start state, we will always have an automaton
in the start state. This, along with the fact that
TRANSIT is always true, implies that the $i$-th initialized automaton
would be tracking $h^e_i$, the $i$-th ET occurrence. Let us
denote by $A^\alpha_i$ the $i$-th initialized automaton. However, since
JOIN-AUTOMATON is also always true, not all automata
(initialized for this episode) would result in incrementing
the frequency; some of them would be removed when one
automaton transits into a state already occupied by some
other automaton. In view of Lemma 2 and Remark 4, if
we show that the automaton $A^\alpha_i$ results in an increment of the
frequency if and only if $h^e_i$, the occurrence tracked by it, is
such that $h^e_i(v_N) < h^e_{i+1}(v_N)$, then the proof of correctness
of the MO algorithm is complete.

Lemma 3. In the MO algorithm, the $i$-th automaton that
was initialized for $\alpha$, referred to as $A^\alpha_i$, contributes to the
frequency count iff $h^e_i(v_N) < h^e_{i+1}(v_N)$.

Proof.

$A^\alpha_i$ does not contribute to the frequency
$\Rightarrow$ $A^\alpha_i$ is removed by a more recently initialized automaton
$\Rightarrow$ $\exists A^\alpha_k$, $k > i$, which transits into a state already occupied by $A^\alpha_i$
$\Rightarrow$ $\exists k, j$ s.t. $k > i$, $1 < j \leq N$ and $h^e_i(v_j) = h^e_k(v_j)$
$\Rightarrow$ $\exists j$, $1 < j \leq N$, s.t. $h^e_i(v_j) = h^e_{i+1}(v_j)$,
    because, by Remark 3, for $k > i$, $h^e_i(v_j) \leq h^e_{i+1}(v_j) \leq h^e_k(v_j)\ \forall j$
$\Rightarrow$ $h^e_i(v_N) = h^e_{i+1}(v_N)$.

The last step follows because both $h^e_i$ and $h^e_{i+1}$ are ET occurrences
and hence $h^e_i(v_j) = h^e_{i+1}(v_j)$ implies
$h^e_i(v_{j'}) = h^e_{i+1}(v_{j'})\ \forall j' > j$.
Conversely, we have

$A^\alpha_i$ contributes to the frequency
$\Rightarrow$ $\forall j$, $1 \leq j \leq N$, $h^e_i(v_j) < h^e_{i+1}(v_j)$
$\Rightarrow$ $h^e_i(v_N) < h^e_{i+1}(v_N)$.

The first step follows because, if $A^\alpha_i$ contributes to the frequency,
then no automaton initialized after it would ever
come to the same state occupied by it, and since all occurrences
tracked are earliest transiting, this must mean
$h^e_i(v_j) < h^e_{i+1}(v_j)\ \forall j$. This completes the proof of the lemma. □

Another interesting observation is that if $h^e_i$ is minimal,
then it is non-interleaved with $h^e_{i+1}$. Suppose $h^e_i$ is minimal
and $h^e_i$ is not non-interleaved with $h^e_{i+1}$. Since $h^e_i$ is
minimal, we have $h^e_i(v_j) < h^e_{i+1}(v_j)\ \forall j$. If $h^e_i$ is not
non-interleaved with $h^e_{i+1}$, there exists a $j < N$ such that
$h^e_{i+1}(v_j) < h^e_i(v_{j+1})$. Thus we must have
$h^e_i(v_j) < h^e_{i+1}(v_j) < h^e_i(v_{j+1}) < h^e_{i+1}(v_{j+1})$.
But this cannot be, because $E_{h^e_i(v_{j+1})}$
is the earliest $\alpha[j+1]$ after $h^e_i(v_j)$, and if it is also after
$h^e_{i+1}(v_j)$ then the fact that both $h^e_i$ and $h^e_{i+1}$ are ET occurrences
should mean $h^e_i(v_{j+1}) = h^e_{i+1}(v_{j+1})$, which contradicts
that $h^e_i$ is minimal. Hence $h^e_i$ and $h^e_{i+1}$ are non-interleaved.

Thus, given the sequence of minimal windows, the earliest
transiting occurrences from each of these minimal windows
give a sequence of (the same number of) non-interleaved occurrences.
This leads to $f_{mi} \leq f_{ni}$ as stated earlier in (3).



4.2 Other ET occurrences-based algorithms 

4.2.1 Proofs of correctness for NO-X and NO-I

The NO-X algorithm can be viewed as a slight modifica- 
tion to the MO algorithm. As in the MO algorithm, we 
always have an automaton in the start state and all au- 
tomata make transitions as soon as possible and when an 
automaton transits into a state occupied by another, the 
older one is removed. However, in the NO-X algorithm, the 
INCREMENT-FREQ variable is true only when we have an 
occurrence satisfying Tx constraint. Hence, to start with, 
we look for the first minimal occurrence which satisfies the 
expiry time constraint and increment frequency. At this 
point, (unlike in the MO algorithm) we terminate all au- 
tomata except the one in the start state since we are try- 
ing to construct a non-overlapped set of occurrences. Then 
we look for the next earliest minimal occurrence (which 
will be non-overlapped with the first one) satisfying expiry 
time constraint and so on. Since minimal occurrences locally
have the least time span, this strategy of searching
for minimal occurrences satisfying the expiry time constraint
in a non-overlapped fashion is quite intuitive. Let
$\mathcal{H}_{nX} = \{h^{nX}_1, h^{nX}_2, \ldots, h^{nX}_f\}$ denote the sequence of occurrences tracked
by the NO-X algorithm (for an N-node episode). Then the
following property of $\mathcal{H}_{nX}$ is obvious.

Property 1. $h^{nX}_1$ is the earliest minimal occurrence satisfying
the expiry time constraint. For $i > 1$, $h^{nX}_i$ is the first minimal
occurrence (satisfying the expiry time constraint) that starts after
$h^{nX}_{i-1}(v_N)$. There is no minimal occurrence satisfying the expiry
time constraint which starts after $h^{nX}_f(v_N)$.

Theorem 1. $\mathcal{H}_{nX}$ is a maximal non-overlapped sequence
satisfying expiry time constraint $T_X$.

Proof. Consider any other set of non-overlapped occurrences
satisfying expiry constraints, $\mathcal{H}' = \{h'_1, h'_2, \ldots, h'_l\}$, such
that $h'_i <_* h'_{i+1}$. Let $m = \min\{f, l\}$. Then we first show

$$h^{nX}_i(v_N) \leq h'_i(v_N) \quad \forall i = 1, 2, \ldots, m.$$

Suppose $h'_1(v_N) < h^{nX}_1(v_N)$. Consider the earliest transiting
occurrence $h^e$ starting from $h'_1(v_1)$. This ends at or
before $h'_1(v_N)$ by Lemma 1. Among all ET occurrences that
end at the same event as $h^e$, the last one (under the lexicographic
ordering) is a minimal occurrence by Lemma 2.
Its window is contained in that of $h'_1$, which satisfies the
expiry time constraint. Hence we have found a minimal
occurrence satisfying the expiry constraint ending before $h^{nX}_1$,
which contradicts the first statement of Property 1. Hence
$h^{nX}_1(v_N) \leq h'_1(v_N)$. Now applying the same argument to
the data stream starting with the first event after $h^{nX}_1(v_N)$,
we get $h^{nX}_2(v_N) \leq h'_2(v_N)$ and so on, and thus can conclude
$h^{nX}_i(v_N) \leq h'_i(v_N)\ \forall i$. This shows that no other set
of non-overlapped occurrences can have more occurrences
than those in $\mathcal{H}_{nX}$. Hence, $\mathcal{H}_{nX}$ is maximal. □

If we choose Tx equal to the time span of the data stream, 
the NO-X algorithm reduces to the NO-I algorithm because 
every occurrence satisfies expiry constraint. Hence proof of 
correctness of NO-I algorithm is immediate. 
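Property 1 can be read as a greedy selection over minimal occurrences, which the following sketch makes concrete. It is not the automata-based NO-X algorithm itself: it first lists ET occurrences, keeps the minimal ones (Lemma 2), and then picks non-overlapped ones whose span is less than $T_X$, following the strict inequality of Table 5. Events are assumed to be (type, time) pairs with distinct, increasing times; the names `et_time_spans` and `count_no_x` are hypothetical. Passing an infinite `tx` corresponds to the NO-I case.

```python
def et_time_spans(events, episode):
    """(start_time, end_time) of each earliest transiting occurrence, in order."""
    spans = []
    for i, (etype, t_start) in enumerate(events):
        if etype != episode[0]:
            continue
        pos, complete = i, True
        for symbol in episode[1:]:
            pos = next((p for p in range(pos + 1, len(events))
                        if events[p][0] == symbol), None)
            if pos is None:
                complete = False
                break
        if complete:
            spans.append((t_start, events[pos][1]))
    return spans


def count_no_x(events, episode, tx):
    """Greedy count of non-overlapped minimal occurrences with span < tx."""
    spans = et_time_spans(events, episode)
    # Lemma 2: an ET occurrence is minimal iff the next one ends strictly later.
    minimal = [s for k, s in enumerate(spans)
               if k + 1 == len(spans) or s[1] < spans[k + 1][1]]
    freq, last_end = 0, float("-inf")
    for start, end in minimal:
        if start > last_end and end - start < tx:
            freq += 1
            last_end = end
    return freq


stream = [("A", 1), ("B", 9), ("A", 10), ("B", 11)]
print(count_no_x(stream, "AB", 5))             # 1
print(count_no_x(stream, "AB", float("inf")))  # 2
```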

4.2.2 Relation between NO-I and NO algorithms 

We now explain the relation between the sets of occurrences
tracked by the NO and NO-I algorithms. As proved in [10],
the NO algorithm (which uses one automaton per episode)
tracks a maximal non-overlapped sequence of occurrences,
say, $\mathcal{H}_{no} = \{h^{no}_1, h^{no}_2, \ldots, h^{no}_f\}$. Since the NO-I algorithm
has no expiry time constraint, it also tracks a maximal set
of non-overlapped occurrences. Among all the ET occurrences
that end at $h^{no}_i(v_N)$, let $h^{in}_i$ be the last one (as per
the lexicographic ordering). Then the $i$-th occurrence tracked
by the NO-I algorithm would be $h^{in}_i$, as we show now. Since
$h^{no}_1$ would be the first ET occurrence, it is clear from our
discussion in the previous subsection that the first occurrence
tracked by the MO algorithm would be $h^{in}_1$. As is
easy to see, the MO and NO-I algorithms would be identical
till the first time an automaton reaches the accepting
state. Hence $h^{in}_1$ would be the first occurrence tracked by
the NO-I algorithm. Now the NO-I algorithm would remove
all automata except for the one in the start state. Hence, it
is as if we start the algorithm with data starting with the
first event after $h^{no}_1(v_N) = h^{in}_1(v_N)$. Now, by the property
of the NO algorithm, $h^{no}_2$ would be the first ET occurrence in
this data stream and hence $h^{in}_2$ would be the first minimal
window here. Hence it is the second occurrence tracked by
NO-I and so on.

The above also shows that each occurrence tracked by the
NO-I algorithm is also tracked by the MO algorithm and
hence we have $f_{no} \leq f_{mi}$ as stated in (3). $\mathcal{H}_{in}$ is also a maximal
set of non-overlapping minimal windows as discussed
in [21].

4.3 Non-interleaved and Distinct Occurrences 
based Algorithms 

The algorithm NI, which counts non-interleaved occurrences,
is different from all the ones discussed so far because it does
not track ET occurrences. Here also we always have an automaton
waiting in the start state. However, the transitions
are conditional in the sense that the $i$-th created automaton
makes a transition from state $(j-1)$ to $j$ provided the
$(i-1)$-th created automaton is past state $j$ after processing
the current event. This is because we want the $i$-th automaton
to track an occurrence non-interleaved with the occurrence
tracked by the $(i-1)$-th automaton. Let $\mathcal{H}_{ni} = \{h^{ni}_1, h^{ni}_2, \ldots, h^{ni}_f\}$
be the sequence of occurrences tracked by NI. From the
above discussion it is clear that it has the following property
(while counting occurrences of $\alpha$).

Property 2. $h^{ni}_1$ is the first or earliest occurrence (of
$\alpha$). For all $i > 1$ and $\forall j = 1, \ldots, N-1$, $h^{ni}_i(v_j)$ is the
first occurrence of $\alpha[j]$ at or after $h^{ni}_{i-1}(v_{j+1})$; and $h^{ni}_i(v_N)$
is the earliest occurrence of $\alpha[N]$ after $h^{ni}_i(v_{N-1})$. There is
no occurrence of $\alpha$ beyond $h^{ni}_f$ which is non-interleaved with
it.

The proof that $\mathcal{H}_{ni}$ is a maximal non-interleaved sequence
is very similar in spirit to that of the NO-X algorithm. As
earlier, we can show that given an arbitrary sequence of
non-interleaved occurrences $\mathcal{H}' = \{h'_1, h'_2, \ldots, h'_l\}$, we have
$h^{ni}_i(v_k) \leq h'_i(v_k)\ \forall i, k$, and hence get the correctness proof
of the NI algorithm. It is easy to verify the correctness of the
DO algorithm also along similar lines.
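Property 2 can be turned into a small constructive sketch. This is one reading of the property, not the automata-based NI algorithm: each node of the $i$-th occurrence is taken as the earliest event of the right type that lies after the previous node of the same occurrence and at or after the next node of the $(i-1)$-th occurrence. The extra bound on the last node (strictly after the last node of the previous occurrence) is an assumption added here so that single-node episodes also terminate. Positions are 0-based and the name `non_interleaved_occurrences` is hypothetical.

```python
def non_interleaved_occurrences(sequence, episode):
    """Greedy non-interleaved occurrences, as tuples of 0-based positions."""
    n = len(episode)
    occurrences, prev = [], None
    while True:
        occ = []
        for j, symbol in enumerate(episode):
            # Must come after the previous node of this occurrence.
            lo = occ[-1] + 1 if occ else 0
            if prev is not None:
                # Must be at or after node j+1 of the previous occurrence;
                # the last node must simply come after the previous last node.
                bound = prev[j + 1] if j + 1 < n else prev[j] + 1
                lo = max(lo, bound)
            pos = next((p for p in range(lo, len(sequence))
                        if sequence[p] == symbol), None)
            if pos is None:
                return occurrences
            occ.append(pos)
        occurrences.append(tuple(occ))
        prev = occ


print(non_interleaved_occurrences("ABACBDCD", "ABCD"))  # [(0, 1, 3, 5), (2, 4, 6, 7)]
print(len(non_interleaved_occurrences("ABACBDCD", "ABD")))  # 1
```

On the stream ABACBDCD this gives two non-interleaved occurrences of A → B → C → D and one each of A → B → D and A → C → D.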

It appears difficult to extend both the NI and DO algorithms
to incorporate expiry time constraints. For this we should
track a set of occurrences $h_1, h_2, \ldots$ of $\alpha$, where $h_1$ is the
first occurrence satisfying $T_X$ and $h_2$ is the next earliest
occurrence satisfying $T_X$ that is non-interleaved with (or
distinct from, in case of DO) $h_1$, and so on. Note that this $h_2$
need not be the earliest occurrence non-overlapped
with $h_1$. At present, there are no algorithms for counting
non-interleaved or distinct occurrences satisfying an expiry
time constraint.

Before ending this section, we briefly outline what needs
to be done when the data stream contains multiple events
having the same time of occurrence. An important thing
to note is that two events having the same time of occurrence
cannot be a part of a serial episode occurrence. Hence,
each automaton can accept at most one event from a set of
events having the same occurrence time. With this condition,
the DO, AO and HD algorithms go through as before.
One would need to process the set of events having the same
occurrence time together and allow all the permissible automata
to make a one-step transition first, as done using the
transitions() list in [14]. After this, before processing the
set of events with the next occurrence time, we would need
to do the multiple automata check for the various candidate
episodes and delete the appropriate older automata for algorithms
MO, MO-X, NO-I and NO-X. For the non-interleaved
algorithm, one needs to actually backtrack the transitions
which resulted in two automata coalescing.

5. CANDIDATE GENERATION 

In this section, we discuss the anti-monotonicity properties
of the various frequency counts, which in turn are exploited
by their respective candidate generation steps in the Apriori-style
level-wise procedure for frequent episode discovery.
It is well known that the windows-based [14], non-overlapped
[9] and total [7] frequency measures satisfy the anti-monotonicity
property that all subepisodes of a frequent episode are frequent.
One can verify that the same holds for the distinct
occurrences based frequency too. It has been pointed out
in [7] that the head frequency does not satisfy this anti-monotonicity
property. For an episode $\alpha$, in general, only
the subepisodes involving $\alpha[1]$ are as frequent as $\alpha$ under
the head count. In a level-wise Apriori-based episode discovery,
the candidate generation for the head frequency count
would exploit the condition that if an N-node episode is frequent,
then all (N-1)-node subepisodes that include $\alpha[1]$
have to be frequent. The head frequency definition has some
limitations in the sense that the frequency of the (N-1)-node
suffix subepisode (see footnote 2) can be arbitrarily low. Consider the
event stream with 100 As followed by a B and C. Suppose
all occurrences of $A \rightarrow B \rightarrow C$ satisfy the expiry constraint
$T_X$. Even though there are 100 occurrences of $A \rightarrow B \rightarrow C$,
there is only one occurrence of $B \rightarrow C$. This can be a
problem when one desires that the frequent episodes capture
repetitive causative influences.
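The 100-As example can be checked numerically. The sketch below counts earliest transiting occurrences, which (as noted in Remark 2) coincide with the head frequency when the window width is large; it therefore assumes that every occurrence satisfies the expiry constraint, as in the example. Event types are single characters and the function name `count_et_occurrences` is hypothetical.

```python
def count_et_occurrences(sequence, episode):
    """Number of earliest transiting occurrences of a serial episode."""
    total = 0
    for start, event in enumerate(sequence):
        if event != episode[0]:
            continue
        pos = start
        for symbol in episode[1:]:
            pos = sequence.find(symbol, pos + 1)  # earliest later event
            if pos == -1:
                break
        else:
            total += 1
    return total


stream = "A" * 100 + "BC"
print(count_et_occurrences(stream, "ABC"))  # 100
print(count_et_occurrences(stream, "AB"))   # 100
print(count_et_occurrences(stream, "BC"))   # 1
```

The prefix subepisode A → B keeps the full count, while the suffix subepisode B → C drops to one, which is the limitation described above.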

Like the head frequency, the minimal occurrences (windows)
and the non-interleaved occurrences also do not satisfy the
anti-monotonicity property that all subepisodes are at least
as frequent as the corresponding episode. However, the
(N-1)-node prefix and suffix subepisodes are at least as
frequent as the episode, as we show below. For an example,
consider a data stream where successive events are given by
ABACBDCD. Even though there are two minimal windows
(and two non-interleaved occurrences) of $A \rightarrow B \rightarrow C \rightarrow D$,
there is only one minimal window (and one non-interleaved
occurrence) of each of the non-prefix and non-suffix
subepisodes $A \rightarrow B \rightarrow D$ and $A \rightarrow C \rightarrow D$. However,
the situation here is not as bad as that for head frequency
because all such subepisodes will have at least as
many distinct occurrences as the number of minimal or non-interleaved
occurrences of the episode, at least in the case of injective
episodes. (Note that this example is that of an injective
episode.) This is because, in the case of injective episodes,
the number of distinct occurrences is always at least
the non-interleaved count, which in turn is at least the
minimal windows count. Hence, given that there are $f$ non-interleaved
or minimal occurrences of an injective episode
$\alpha$, there are at least $f$ distinct occurrences of $\alpha$ too. Since
the distinct occurrences based frequency satisfies the original
anti-monotonicity property, all subepisodes of $\alpha$ too will
have at least $f$ distinct occurrences.

Footnote 2: Given an N-node episode $\alpha[1] \rightarrow \alpha[2] \rightarrow \cdots \rightarrow \alpha[N]$, its
K-node prefix subepisode is $\alpha[1] \rightarrow \alpha[2] \rightarrow \cdots \rightarrow \alpha[K]$ and
its (N-K)-node suffix subepisode is $\alpha[K+1] \rightarrow \alpha[K+2] \rightarrow \cdots \rightarrow \alpha[N]$,
for $K = 1, 2, \ldots, (N-1)$.

We now formally prove the anti-monotonicity property for 
minimal and non-interleaved occurrences based frequencies. 



Theorem 2. If an N-node serial episode $\alpha$ has a frequency
$f$ in the minimal or the non-interleaved sense, then its
(N-1)-node prefix subepisode ($\alpha_p$) and suffix subepisode ($\alpha_s$) have a
frequency of at least $f$.

Proof. Consider a minimal window of the episode $\alpha$,
$w = [n_s, n_e]$. Consider the earliest occurrence $h_p$ of the
prefix subepisode starting from $n_s$ and let $w'$ be its window.
Any proper sub-window of $w'$ starting at $n_s$ and containing
an occurrence of $\alpha_p$ contradicts Lemma 1. A proper sub-window
of $w'$ containing an occurrence of $\alpha_p$ starting after
$n_s$ would contradict the minimality of $w$ itself. Hence $w'$ is
a minimal window of $\alpha_p$ starting at $n_s$. Since distinct minimal
windows of $\alpha$ have distinct start times, they yield distinct
minimal windows of $\alpha_p$. We hence conclude
that $\alpha_p$ has a frequency of at least $f$. A similar proof works
for the suffix subepisode by considering the window of the
last occurrence $h_s$ of the suffix subepisode ending at $n_e$.

Let $\mathcal{H}_{ni} = \{h_1, h_2, \ldots, h_f\}$ be a maximal non-interleaved
sequence. From each occurrence $h_k$, we choose the sub-occurrence
$h'_k = [h_k(v_1), h_k(v_2), \ldots, h_k(v_{N-1})]$ of $\alpha_p$. It is
easy to see that this new sequence of occurrences $h'_1, h'_2, \ldots, h'_f$
forms a non-interleaved sequence. Hence the frequency of
$\alpha_p$ is at least $f$. A similar argument works for the suffix
episode. □

Hence, for every episode $\alpha$, we extract its (N-1)-node suffix, go
down the candidate list and search for a block of episodes
whose (N-1)-node prefix matches this suffix. We form as many
candidates as the number of episodes in this matching
block. This kind of candidate generation has already been
reported in the literature in [20], [18] and [19] in the context
of sequences under inter-event time constraints.
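The prefix/suffix matching step described above can be sketched as follows. This is a minimal illustration, assuming episodes are represented as equal-length tuples of event types and that a dictionary keyed by the (N-1)-node prefix stands in for "going down the candidate list"; the function name `generate_candidates` is hypothetical.

```python
def generate_candidates(frequent_episodes):
    """Build (N+1)-node candidates from frequent N-node serial episodes."""
    frequent = sorted(set(frequent_episodes))
    # Group episodes into blocks that share the same (N-1)-node prefix.
    blocks = {}
    for episode in frequent:
        blocks.setdefault(episode[:-1], []).append(episode)
    candidates = []
    for episode in frequent:
        suffix = episode[1:]
        # Every episode whose prefix equals this suffix yields one candidate.
        for other in blocks.get(suffix, []):
            candidates.append(episode + other[-1:])
    return candidates


frequent_2 = [("A", "B"), ("B", "C"), ("B", "D"), ("C", "D")]
print(generate_candidates(frequent_2))
# [('A', 'B', 'C'), ('A', 'B', 'D'), ('B', 'C', 'D')]
```

Each generated candidate has its (N)-node prefix and suffix in the frequent set, which is exactly the condition Theorem 2 makes necessary for the minimal and non-interleaved counts.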

6. DISCUSSION AND CONCLUSIONS 

The framework of frequent episodes in event streams is a 
very useful data mining technique for unearthing tempo- 
ral dependencies from data streams in many applications. 
The framework is about a decade old and many different 
frequency measures and associated algorithms have been 
proposed over the last ten years. In this paper we have
presented a generic automata-based algorithm for obtaining
frequencies of a set of candidate episodes. This method
unifies all the known algorithms in the sense that we can
particularize our algorithm (by setting values for a set of 
variables) for counting frequent episodes under any of the 
frequency measures proposed in literature. 
As we showed here, this unified view gives useful insights 
into the kind of occurrences counted under different fre- 
quency definitions and thus also allows us to prove relations 
between frequencies of an episode under different frequency 
definitions. Our view also allows us to get correctness proofs 
for all algorithms. We introduced the notion of earliest tran- 
siting occurrences and, using this concept, are able to get 
simple proofs of correctness for most algorithms. This has 
also allowed us to understand the kind of anti-monotonicity 
properties satisfied by different frequency measures. 
While the main contribution of this paper is this unified view 
of all frequency counting algorithms, some of the specific results
presented here are also new. The relationships between
different frequencies of an episode (cf. Eq. (3)) are proved
here for the first time. The distinct-occurrences based frequency
and an automata-based algorithm for it are novel.
The specific proof of correctness presented here for minimal
occurrences is also novel. Also, the correctness proof for
non-overlapped occurrences based frequency counting under
an expiry time constraint has been provided here for the
first time.

In this paper we have considered only the case of serial 
episodes. This is because, at present, there are no algorithms 
for discovering general partial orders under the various fre- 
quency definitions. However, all counting algorithms ex- 
plained here for serial episodes can be extended to episodes 
with a general partial order structure. We can come up with 
a similar finite state automaton (FSA) which tracks the earliest
transiting occurrences of an episode with a general partial 
order structure pQ. For example, consider a partial order 
episode $(AB) \rightarrow C$, which represents A and B occurring in
any order followed by a C. In order to track an occurrence
of such a pattern, the initial state has to wait for either of
A and B. On seeing an A it goes to state-1 where it waits
only for a B; on the other hand, on seeing a B first it moves
to state-2 where it waits only for an A. Then on seeing a
B in state-1 or seeing an A in state-2 it moves into state-3
where it waits for a C and so on. Thus, in each state in
such a FSA, in general, we wait for any of a set of event 
types (instead of a single event for serial episodes) and a 
given state will now branch out into different states on dif- 
ferent event types. With such a FSA technique it is possible 
to generalize the method presented here so that we have 
algorithms for counting frequencies of general partial order 
episodes under different frequencies. The proofs presented 
here for serial episodes can also be extended for general par- 
tial order episodes. While it seems possible, as explained 
above, to generalize the counting schemes to handle general 
partial order episodes, it is not obvious what would be an 
appropriate candidate generation scheme for general partial 
order episodes under different frequency definitions. This is 
an important direction for future work. 
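The FSA described above for the partial order episode (AB) → C can be written down as a transition table. The sketch below tracks only a single earliest transiting occurrence and does not implement any frequency count; numbering the accepting state as 4 and the names `TRANSITIONS`, `FINAL_STATE` and `first_occurrence` are assumptions made for illustration.

```python
# (current state, event type) -> next state, following the description above.
TRANSITIONS = {
    (0, "A"): 1,  # saw A first: now wait only for B
    (0, "B"): 2,  # saw B first: now wait only for A
    (1, "B"): 3,
    (2, "A"): 3,
    (3, "C"): 4,  # state 4 is used here as the accepting state
}
FINAL_STATE = 4


def first_occurrence(sequence):
    """Positions of the events used by the first occurrence, or None."""
    state, used = 0, []
    for pos, event in enumerate(sequence):
        next_state = TRANSITIONS.get((state, event))
        if next_state is None:
            continue  # the automaton is not waiting for this event type
        state = next_state
        used.append(pos)
        if state == FINAL_STATE:
            return used
    return None


print(first_occurrence("BXAC"))   # [0, 2, 3]
print(first_occurrence("AXBYC"))  # [0, 2, 4]
print(first_occurrence("ACB"))    # None
```

Unlike the serial case, a state may have several outgoing transitions (state 0 here), which is the branching behaviour the text refers to.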
In this paper, we have considered only expiry time constraint 
which prescribes an upper bound on the span of the occur- 
rence. It would be interesting to see under what other time 
constraints (e.g., gap constraints), design of counting algo- 
rithms under this generic framework is possible. Also, some 
unexplored choice of the boolean conditions in the proposed 
generic algorithm may give rise to algorithms for new useful
frequency measures. This is also a useful direction of
research to explore.